ReBNet: Residual Binarized Neural Network
This paper proposes ReBNet, an end-to-end framework for training
reconfigurable binary neural networks in software and developing efficient
accelerators for their execution on FPGAs. Binary neural networks offer an intriguing
opportunity for deploying large-scale deep learning models on
resource-constrained devices. Binarization reduces the memory footprint and
replaces power-hungry matrix multiplications with lightweight XnorPopcount
operations. However, binary networks suffer from a degraded accuracy compared
to their fixed-point counterparts. We show that state-of-the-art methods
for optimizing the accuracy of binary networks significantly increase
implementation cost and complexity. To compensate for the degraded accuracy
while adhering to the simplicity of binary networks, we devise the first
reconfigurable scheme that can adjust the classification accuracy based on the
application. Our scheme improves classification accuracy by
representing features with multiple levels of residual binarization. Unlike
previous methods, our approach does not exacerbate the area cost of the
hardware accelerator. Instead, it provides a tradeoff between throughput and
accuracy, while the area overhead of multi-level binarization is negligible.

Comment: To appear in the 26th IEEE International Symposium on Field-Programmable Custom Computing Machines
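To make the residual-binarization idea concrete, here is a minimal NumPy sketch of multi-level binarization, where each level binarizes the residual left over by the previous levels. The per-level scaling factor is taken as the mean absolute residual here; ReBNet learns these factors during training, so that choice is our simplification, not the paper's exact method.

```python
import numpy as np

def residual_binarize(x, levels=2):
    """Approximate a real-valued tensor x as a sum of `levels` scaled
    binary tensors; each level binarizes the remaining residual."""
    approx = np.zeros_like(x)
    residual = x.copy()
    for _ in range(levels):
        gamma = np.abs(residual).mean()   # stand-in for a learned scale
        level = gamma * np.sign(residual)
        approx += level
        residual -= level
    return approx

x = np.random.randn(4, 4).astype(np.float32)
print(np.abs(x - residual_binarize(x, levels=1)).mean())  # 1-level error
print(np.abs(x - residual_binarize(x, levels=3)).mean())  # typically smaller
```

Each extra level adds one more binary tensor per feature, which is why the scheme trades throughput for accuracy rather than area.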
XONN: XNOR-based Oblivious Deep Neural Network Inference
Advancements in deep learning enable cloud servers to provide
inference-as-a-service for clients. In this scenario, clients send their raw
data to the server, which runs the deep learning model and sends the results back.
One standing challenge in this setting is to ensure the privacy of the clients'
sensitive data. Oblivious inference is the task of running the neural network
on the client's input without disclosing the input or the result to the server.
This paper introduces XONN, a novel end-to-end framework based on Yao's Garbled
Circuits (GC) protocol, which provides a paradigm shift in the conceptual and
practical realization of oblivious inference. In XONN, the costly
matrix-multiplication operations of the deep learning model are replaced with
XNOR operations that are essentially free in GC. We further provide a novel
algorithm that customizes the neural network such that the runtime of the GC
protocol is minimized without sacrificing the inference accuracy.
We design a user-friendly high-level API for XONN, allowing expression of the
deep learning model architecture at an unprecedented level of abstraction.
Extensive proof-of-concept evaluation on various neural network architectures
demonstrates that XONN outperforms prior art such as Gazelle (USENIX
Security'18) by up to 7x, MiniONN (ACM CCS'17) by 93x, and SecureML (IEEE
S&P'17) by 37x. State-of-the-art frameworks require one round of interaction
between the client and the server for each layer of the neural network,
whereas XONN requires a constant number of interaction rounds for any number of
layers in the model. XONN is the first to perform oblivious inference on Fitnet
architectures with up to 21 layers, suggesting a new level of scalability
compared with the state of the art. Moreover, we evaluate XONN on four datasets
to perform privacy-preserving medical diagnosis.

Comment: To appear in USENIX Security 2019
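The reason XNOR operations are "essentially free" in garbled circuits is the Free-XOR optimization, which evaluates XOR/XNOR gates without cryptographic work. The following minimal Python sketch shows only the plaintext arithmetic identity XONN builds on, not the protocol itself: the inner product of two {-1, +1} vectors packed as bit-masks reduces to an XNOR followed by a popcount.

```python
def xnor_popcount_dot(a_bits, b_bits, n):
    """Inner product of two {-1,+1} vectors of length n, each packed as an
    n-bit integer (bit 1 encodes +1, bit 0 encodes -1):
        dot(a, b) = 2 * popcount(XNOR(a, b)) - n
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # mask back to n bits
    return 2 * bin(xnor).count("1") - n

# a = +1,-1,+1,+1 -> 0b1011 ; b = +1,+1,-1,+1 -> 0b1101
print(xnor_popcount_dot(0b1011, 0b1101, 4))  # (+1)+(-1)+(-1)+(+1) = 0
```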
Improving vision-inspired keyword spotting using dynamic module skipping in streaming conformer encoder
Using a vision-inspired keyword spotting framework, we propose an
architecture with input-dependent dynamic depth capable of processing streaming
audio. Specifically, we extend a conformer encoder with trainable binary gates
that allow us to dynamically skip network modules according to the input audio.
Our approach improves detection and localization accuracy on continuous speech
using the top-1000 most frequent LibriSpeech words, while maintaining a small memory
footprint. The inclusion of gates also reduces the average amount of processing
without affecting the overall performance. These benefits are shown to be even
more pronounced on the Google Speech Commands dataset placed over background
noise, where up to 97% of the processing is skipped on non-speech inputs,
making our method particularly interesting for an always-on keyword
spotter.
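A minimal PyTorch sketch of the gating idea follows: a trainable gate scores the input and a straight-through estimator keeps the hard skip/keep decision differentiable. The pooling, gate placement, and module contents here are our illustrative assumptions rather than the paper's exact architecture, and in practice a compute penalty on the gate probabilities would be needed to encourage skipping.

```python
import torch
import torch.nn as nn

class GatedModule(nn.Module):
    """Wrap a sub-module with an input-dependent binary gate (illustrative)."""

    def __init__(self, module: nn.Module, dim: int):
        super().__init__()
        self.module = module
        self.gate = nn.Linear(dim, 1)  # scores the time-pooled input

    def forward(self, x):  # x: (batch, time, dim)
        p = torch.sigmoid(self.gate(x.mean(dim=1)))  # keep probability
        hard = (p > 0.5).float()                     # binary skip/keep decision
        g = hard + p - p.detach()                    # straight-through estimator
        # g == 0 makes the block an identity; at inference one would branch
        # on `hard` before calling self.module to actually save the compute
        return x + g.unsqueeze(-1) * self.module(x)

# usage: gate a small feed-forward module over (batch, time, dim) features
block = GatedModule(nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8)), dim=8)
y = block(torch.randn(2, 50, 8))  # same shape as the input
```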
HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
Streaming keyword spotting is a widely used solution for activating voice
assistants. Deep Neural Networks with Hidden Markov Model (DNN-HMM) based
methods have proven to be efficient and widely adopted in this space, primarily
because of the ability to detect and identify the start and end of the wake-up
word at low compute cost. However, such hybrid systems suffer from loss metric
mismatch when the DNN and HMM are trained independently. Sequence
discriminative training cannot fully mitigate the loss-metric mismatch due to
the inherently Markovian style of operation. We propose a low-footprint CNN
model, called HEiMDaL, to detect and localize keywords in streaming conditions.
We introduce an alignment-based classification loss to detect the occurrence of
the keyword along with an offset loss to predict the start of the keyword.
HEiMDaL shows a 73% reduction in detection metrics, along with equivalent
localization accuracy, at the same memory footprint as existing DNN-HMM-style
models for a given wake-word.
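The following hypothetical sketch illustrates what an alignment-based detection loss combined with a start-offset loss could look like; all names, shapes, and windowing choices are our assumptions, not HEiMDaL's exact formulation.

```python
import torch
import torch.nn.functional as F

def detection_and_offset_loss(logits, offsets, label, start_frame, window):
    """logits: (T,) per-frame keyword scores; offsets: (T,) predicted
    frames-since-start; label: 1 if the keyword occurs; start_frame:
    ground-truth start; window: frames after the start treated as positive."""
    T = logits.shape[0]
    target = torch.zeros(T)
    if label:
        end = min(start_frame + window, T)
        target[start_frame:end] = 1.0  # alignment-based positive region
    det_loss = F.binary_cross_entropy_with_logits(logits, target)
    off_loss = 0.0
    if label:
        # each positive frame predicts how far back the keyword started,
        # letting the detector localize the start at inference time
        frames = torch.arange(start_frame, end, dtype=torch.float32)
        off_loss = F.l1_loss(offsets[start_frame:end], frames - start_frame)
    return det_loss + off_loss
```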
GeneCAI: Genetic Evolution for Acquiring Compact AI
In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving
towards more complex architectures to achieve higher inference accuracy. Model
compression techniques can be leveraged to efficiently deploy such
compute-intensive architectures on resource-limited mobile devices. Such
methods involve various hyper-parameters that require per-layer customization
to ensure high accuracy. Choosing such hyper-parameters is cumbersome as the
pertinent search space grows exponentially with model layers. This paper
introduces GeneCAI, a novel optimization method that automatically learns how
to tune per-layer compression hyper-parameters. We devise a bijective
translation scheme that encodes compressed DNNs to the genotype space. The
optimality of each genotype is measured using a multi-objective score based on
accuracy and the number of floating-point operations. We develop customized genetic
operations to iteratively evolve the non-dominated solutions towards the
optimal Pareto front, thus capturing the optimal trade-off between model
accuracy and complexity. The GeneCAI optimization method is highly scalable and can
achieve a near-linear performance boost on distributed multi-GPU platforms. Our
extensive evaluations demonstrate that GeneCAI outperforms existing rule-based
and reinforcement learning methods in DNN compression by finding models that
lie on a better accuracy-complexity Pareto curve.
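A minimal sketch of the search loop may help: genotypes hold one compression hyper-parameter per layer, fitness is the (accuracy, FLOPs) pair, and each generation keeps the non-dominated front. The pruning-rate genotype, the mutation operator, and the `evaluate` callback (which would compress, fine-tune, and measure the model) are illustrative stand-ins, not GeneCAI's bijective encoding or exact genetic operators.

```python
import random

def random_genotype(num_layers):
    # one compression hyper-parameter per layer (here: a pruning rate)
    return [random.uniform(0.0, 0.9) for _ in range(num_layers)]

def mutate(g, sigma=0.05):
    return [min(0.9, max(0.0, r + random.gauss(0, sigma))) for r in g]

def dominates(s, t):
    # s, t are (accuracy, flops): higher accuracy and lower FLOPs win
    return s[0] >= t[0] and s[1] <= t[1] and s != t

def evolve(evaluate, num_layers, pop=20, gens=10):
    """evaluate(genotype) -> (accuracy, flops); the expensive step."""
    population = [random_genotype(num_layers) for _ in range(pop)]
    for _ in range(gens):
        scored = [(g, evaluate(g)) for g in population]
        front = [g for g, s in scored
                 if not any(dominates(t, s) for _, t in scored)]
        # refill the population with mutations of non-dominated solutions
        population = front + [mutate(random.choice(front))
                              for _ in range(pop - len(front))]
    return front
```

Because each genotype is evaluated independently, the per-generation evaluations parallelize naturally, which is the source of the near-linear multi-GPU scaling the abstract mentions.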
End-to-end Customization of Efficient, Private, and Robust Neural Networks
Advancements in machine learning (ML) algorithms, data acquisition platforms, and high-end computer architectures have fueled unprecedented industrial automation. An ML algorithm captures the dynamics of a task by learning an abstract model from domain-specific data. Once the model is trained, it can perform the underlying task with relatively high accuracy. This thesis focuses on Deep Neural Networks (DNNs), a modern class of ML models that have shown promising performance in various applications. Thanks to DNNs, the breadth of automation has been expanded to tasks that were formerly too complex to be performed by computers; nowadays DNNs form the foundation of applications such as voice recognition, medical image analysis, and face authentication, to name a few.

Despite DNNs' benefits, their deployment in real-world applications may be circumscribed by several factors. First, DNNs are computationally complex, and their efficient execution on resource-constrained edge devices is a critical challenge. Second, users of DNN-based applications are often required to expose their data to the service provider, which may violate their privacy. Third, DNN models may fail to function correctly in the presence of malicious attackers. With these challenges in mind, it is a paramount task to design DNN-based systems that are efficient to execute, ensure users' privacy, and are robust to malicious attacks.

This dissertation provides holistic customization techniques that pave the way for efficient, private, and robust DNN inference. The key contributions of the thesis are as follows:

Efficiency: Development of encoded DNNs, a new family of memory-efficient neural networks. The thesis author's contributions provide customization techniques that enable the incorporation of nonlinear encoding into the computation flow of neural networks. An end-to-end framework is introduced to facilitate encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms.
Efficiency: Introducing the concept of lookup-table-based execution of encoded neural networks. The proposed method replaces floating-point multiplications with lookup-table searches (a sketch of this idea appears after this list). A memory-based hardware architecture is then proposed to execute the lookup-based multiplications and accelerate encoded DNN inference.
Privacy: Establishing customized solutions for oblivious inference, where a client holds a data sample and a server holds a DNN model. After running the oblivious inference protocol, the client receives the inference result without revealing her input to the server. This thesis proposes automated customization solutions that speed up oblivious inference while maintaining high inference accuracy.

Robustness: Development of solutions for online detection of neural Trojan triggers, a class of malicious attacks that cause a DNN to perform faulty inferences. The thesis proposes a novel methodology that enhances robustness to Trojan attacks by leveraging dictionary learning and sparse approximation (a sketch of this idea also appears below).
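As a rough illustration of the two Efficiency contributions, the sketch below clusters a layer's weights into a small codebook (a simple nonlinear encoding) and then replaces every multiplication in a matrix-vector product with a table lookup. Both the k-means-style encoding and the lookup layout are our simplifications; the thesis's actual encoding, fine-tuning flow, and memory-based hardware architecture differ.

```python
import numpy as np

def kmeans_codebook(w, bits=3, iters=25):
    """Cluster a layer's weights into 2**bits shared values and store a
    per-weight codebook index (a simple nonlinear encoding)."""
    k = 2 ** bits
    flat = w.ravel()
    centers = np.quantile(flat, np.linspace(0, 1, k))  # spread initial codebook
    for _ in range(iters):
        idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if (idx == j).any():
                centers[j] = flat[idx == j].mean()
    return idx.reshape(w.shape), centers

def lut_matvec(idx, centers, x):
    """Replace multiplies with lookups: precompute x_i * c_j for every
    input element and codebook entry, then gather and sum."""
    table = np.outer(x, centers)  # (len(x), 2**bits) precomputed products
    return table[np.arange(len(x))[None, :], idx].sum(axis=1)

w, x = np.random.randn(16, 32), np.random.randn(32)
idx, centers = kmeans_codebook(w, bits=3)
print(np.abs(w @ x - lut_matvec(idx, centers, x)).max())  # encoding error
```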
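And for the Robustness contribution, a generic sketch of the dictionary-learning idea: a dictionary D is assumed to have been learned offline on activations of clean inputs, and an input whose activation cannot be sparsely approximated well (large residual) is flagged as a potential Trojan trigger. The OMP routine and the residual-ratio score are our assumptions, not the thesis's method, and the detection threshold would be application-specific.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with k atoms of D,
    returning the final residual."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return residual

def trojan_score(D, activation, k=5):
    """High score = activation poorly explained by the clean dictionary."""
    return np.linalg.norm(omp(D, activation, k)) / np.linalg.norm(activation)

# toy demo: a clean activation in the dictionary's span vs. a perturbed one;
# the clean score should come out noticeably smaller
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128)); D /= np.linalg.norm(D, axis=0)
clean = D[:, :5] @ rng.normal(size=5)
print(trojan_score(D, clean), trojan_score(D, clean + rng.normal(size=64)))
```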